Add a flag to control the concurrent execution of aggregations #96023
Pinging @elastic/es-analytics-geo (Team:Analytics)
Left a couple of questions, looks good!
test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java
server/src/main/java/org/elasticsearch/search/aggregations/AggregationBuilder.java
LGTM
LGTM 👍
In elastic#98204 we are introducing unconditional offloading of collection to a separate thread pool, even for requests of phases that don't enable search concurrency. It turns out that some aggs don't support offloading their collection to a separate thread: their postCollect method is executed on the search thread, which trips a Lucene assertion around reusing data structures pulled from the search worker thread. With this commit we allow aggs to specify when they don't support offloading their sequential collection. Such aggs are a subset of the ones that already declare that they don't support concurrency at all. Relates to elastic#96023
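To make the relationship between the two declarations concrete, here is a minimal sketch. Every name in it (AggSketch, CollectionMode, CollectionPlanner, the two boolean fields) is hypothetical and is not the Elasticsearch API. It also assumes that a request falls back to the most restrictive mode required by any of its aggs, which the commit message does not state.

```java
import java.util.List;

/** Hypothetical stand-in for an aggregation builder; not the Elasticsearch API. */
final class AggSketch {
    final String name;
    final boolean supportsConcurrency; // may collection run on several threads at once?
    final boolean supportsOffloading;  // may sequential collection run off the search thread?

    AggSketch(String name, boolean supportsConcurrency, boolean supportsOffloading) {
        // Aggs that cannot be offloaded are a subset of those that cannot run
        // concurrently, so "concurrent but not offloadable" is rejected.
        if (supportsConcurrency && !supportsOffloading) {
            throw new IllegalArgumentException(name + ": concurrent aggs must also support offloading");
        }
        this.name = name;
        this.supportsConcurrency = supportsConcurrency;
        this.supportsOffloading = supportsOffloading;
    }
}

enum CollectionMode { CONCURRENT, OFFLOADED_SEQUENTIAL, SEARCH_THREAD }

final class CollectionPlanner {
    private CollectionPlanner() {}

    /** Assumption: one restrictive agg downgrades the whole block of aggregations. */
    static CollectionMode plan(List<AggSketch> aggs) {
        boolean allOffloadable = aggs.stream().allMatch(a -> a.supportsOffloading);
        if (!allOffloadable) {
            return CollectionMode.SEARCH_THREAD;
        }
        boolean allConcurrent = aggs.stream().allMatch(a -> a.supportsConcurrency);
        return allConcurrent ? CollectionMode.CONCURRENT : CollectionMode.OFFLOADED_SEQUENTIAL;
    }

    public static void main(String[] args) {
        AggSketch concurrentSafe = new AggSketch("concurrent-safe", true, true);
        AggSketch sequentialOnly = new AggSketch("sequential-only", false, true);
        AggSketch searchThreadOnly = new AggSketch("search-thread-only", false, false);
        System.out.println(plan(List.of(concurrentSafe)));
        System.out.println(plan(List.of(concurrentSafe, sequentialOnly)));
        System.out.println(plan(List.of(concurrentSafe, searchThreadOnly)));
    }
}
```

The constructor check encodes the subset relationship directly, so an inconsistent declaration fails when the agg is built rather than at collection time.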
Initial testing has shown that some aggregations are problematic when running in concurrency mode. To be able to run aggregations concurrently, this PR proposes adding a flag that tells whether an aggregation block can be run concurrently. In addition, it updates the AggregatorTestCase to randomly run concurrently. This test has been used to uncover the problematic aggregations.
The problematic aggregations fall into two groups:
The issue with these aggregations is that they have a post-collection phase that uses data structures generated during the collection phase. Because the two phases run on different threads, we hit a Lucene assertion. These aggregations are the cardinality, nested and composite aggregations.
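The failure mode described above can be reproduced in miniature. This is a sketch only: ThreadBoundBuffer and PostCollectSketch are hypothetical names, and an explicit owner-thread check that throws IllegalStateException stands in for the Lucene assertion, whose actual form is not shown in this PR.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical data structure that may only be used by the thread that created it. */
final class ThreadBoundBuffer {
    private final Thread owner = Thread.currentThread();
    private long sum;

    void add(long value) { checkOwner(); sum += value; }

    long sum() { checkOwner(); return sum; }

    private void checkOwner() {
        // Stand-in for the Lucene assertion: reject use from any other thread.
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("buffer used outside its owning thread");
        }
    }
}

final class PostCollectSketch {
    private PostCollectSketch() {}

    /** Collects on a worker thread, then post-collects on the caller; returns true if that is rejected. */
    static boolean postCollectOnCallerFails() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Collection phase: the buffer is created and filled on the worker thread.
            ThreadBoundBuffer buffer = worker.submit(() -> {
                ThreadBoundBuffer b = new ThreadBoundBuffer();
                b.add(1);
                b.add(2);
                return b;
            }).get();
            try {
                buffer.sum(); // post-collection phase on the calling ("search") thread
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
    }

    /** Runs collection and post-collection on the same worker thread; returns the collected sum. */
    static long postCollectOnWorker() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(() -> {
                ThreadBoundBuffer b = new ThreadBoundBuffer();
                b.add(1);
                b.add(2);
                return b.sum();
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("post-collect on caller rejected: " + postCollectOnCallerFails());
        System.out.println("post-collect on worker sum: " + postCollectOnWorker());
    }
}
```

Keeping both phases on one thread avoids the check, which is why a flag that opts an aggregation out of concurrent execution is enough to sidestep the problem.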
Label the PR as non-issue as we don't support concurrent searches.